The idea of analyzing the smashy_lcbench and smashy_super datasets is to understand the dependencies between the hyperparameters and the target variable yval, using the tools implemented in the VisHyp package and, most importantly, without the help of any automatic optimization. We want to understand which parameters are important, i.e., have a great influence on the result. We also want to understand which parameters need a precise setting and for which parameters the value barely matters, as well as the dependencies between the parameters themselves. In the end, we want to compare the results of the two datasets.
For each dataset, we want to examine the entire data set as well as the best 20% of the yval values to get a more detailed insight into the configurations of the best results. In the end, we will produce our own parameter configuration and show that the acquired knowledge can be used to produce good yval values. Additionally, we will subset our data with the narrowed range per parameter to produce a subset of good yval values. This can be done within a table or visually per PCP.
The procedure will be as follows: First, we find the most important parameters via an importance plot. For a fast but inaccurate overview we will use a heatmap (inaccurate because it does not marginalize the other parameters out). For deeper insight into the marginal structure, as well as into dependencies between two parameters, we will then use Partial Dependence Plots (PDP). Only when the data set has been reduced in size can we also use Parallel Coordinate Plots (PCP) to get a fast and rough impression of the situation.
All plots from the VisHyp package require an mlr3 task object as input. Therefore, an mlr3 task with the selected target is required. For lcbench the target is yval, a logloss performance measurement; values close to 0 mean good performance. First of all, we want to know which parameters are important in general.
We need to load the packages and subset the data to compare the whole dataset with the 20% of configurations with the best outcome. In addition, the data must be manipulated slightly to facilitate its use for summaries and filters.
library(VisHyp)
library(mlr3)
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
smashy_lcbench <- readRDS("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Data/smashy_lcbench.rds")
smashy_lcbench <- as.data.frame(smashy_lcbench)
# convert logical and character columns to factors so mlr3 treats them as categorical
for (i in seq_along(smashy_lcbench)) {
if (is.logical(smashy_lcbench[, i]) || is.character(smashy_lcbench[, i]))
smashy_lcbench[, i] <- as.factor(smashy_lcbench[, i])
}
task <- TaskRegr$new(id = "task_lcbench", backend = smashy_lcbench, target = "yval")
lcbenchBest <- smashy_lcbench[smashy_lcbench$yval >= quantile(smashy_lcbench$yval, 0.8),]
taskBest <- TaskRegr$new(id = "taskBest", backend = lcbenchBest, target = "yval")
For deeper insights into the analyses of the individual parameters, they can be selected in the table of contents (TOC) on the left side.
The target parameter yval can reach values between -0.9647 and -0.4690. Our goal is to obtain good results, i.e., to find configurations that produce values close to -0.4690.
The most important parameter is sample. It should always be set to "bohb" rather than "random", because 2130 of the 2143 best configurations were created with this level.
The next very important parameter is survival_fraction. For a good average performance it should be between 0.15 and 0.55. This range is especially important for the "bohblrn" level of the surrogate_learner parameter, as survival_fraction is the most important parameter for this subset. We later found that this setting is only correct for the general case: if we are interested in the best configuration space, the parameter should be high. This is especially true for the surrogate_learners bohblrn and knn1.
Even if the surrogate_learner parameter is not that important itself, it influences most other parameters. This means that the other parameters should be set depending on the chosen surrogate_learner if it has a big impact on them. It is also more important for bohb samples. In the general case, the surrogate_learner parameter showed the best performance with knn1 or knn7 and the worst with ranger. An indication that surrogate_learner has a large impact on other parameters was given by the importance plots of the per-surrogate_learner subsets. We have already seen that the survival_fraction parameter has a large effect on the bohblrn and knn1 levels. We also discussed the impact of surrogate_learner on random_interleave_fraction. In the top cases, we saw that bohblrn and ranger configurations dropped out in disproportionate numbers. Surprisingly, bohblrn turned out to be the level with the most importance. A small interim conclusion for bohb samples:
bohblrn: random_interleave_fraction is better if lower; a good value should be between 0.05 and 0.65. survival_fraction should be lower in the general case and set to a value under 0.5, even though it does not matter for the best configurations. budget_log_step should be over 0.5.
knn1: survival_fraction should get a value over 0.5 if we are interested in the top cases; for the full dataset the best cases were on average under 0.5. random_interleave_fraction should be low, with a value between 0.05 and 0.5 according to the full data set. budget_log_step should be chosen between -0.5 and 0.5. filter_factor_first should get a value under 4. filter_select_per_tournament should get a value over 0.9.
knn7: filter_factor_first should be under 4. survival_fraction should be between 0.1 and 1 according to both the full dataset and the subset. budget_log_step produces good performances for values between -0.5 and 1 but does not have a big impact in general. random_interleave_fraction should be between 0.25 and 0.75 according to the full dataset; in the subset it does not matter. random_interleave_random should be FALSE. filter_select_per_tournament should be over 0.5.
ranger: random_interleave_fraction should be over 0.25, survival_fraction under 0.75, and budget_log_step over -1.5.
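As a quick sanity check, the interim conclusion can be translated into a subset filter. The following is a minimal sketch for the knn1 case, assuming the smashy_lcbench data frame from the loading chunk; the cut-offs are the ones stated above and are purely illustrative, not tuned values.

```r
# Sketch: apply the knn1 interim recommendations as a row filter
knn1Reco <- subset(
  smashy_lcbench,
  sample == "bohb" & surrogate_learner == "knn1" &
    random_interleave_fraction >= 0.05 & random_interleave_fraction <= 0.5 &
    budget_log_step >= -0.5 & budget_log_step <= 0.5 &
    filter_factor_first < 4 & filter_select_per_tournament > 0.9
)
# compare the yval distribution of the filtered rows with the full data;
# the recommended subset should shift clearly towards higher (better) yval
summary(knn1Reco$yval)
summary(smashy_lcbench$yval)
```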
Another important parameter for the general case is random_interleave_fraction. We have found that in general low values under 0.3 are better for random samples, and values between 0.1 and 0.75 are better for bohb samples. But this is only the case because the effect depends on surrogate_learner, and this dataset has many observations for the levels knn1 and knn7; for these levels, a low value must be chosen to get a good result. For "bohblrn", values in the middle are better, and for "ranger" high values achieve the best yval values. In the general case of random sampling, low values perform better. For the top cases, the parameter lost importance; this could be because of the counter case with "random" samples. The surrogate_learner level did not change the behavior for the top cases (for bohblrn, the middle range is no longer so important).
The second most important parameter for bohb sampling is the budget_log_step parameter. For the full dataset this parameter should be set between -0.5 and 1.
filter_with_max_budget is not important in general but should always be set to TRUE, and it is more important for bohb samples. In any case, the effect is important for the surrogate_learner bohblrn in the top cases.
filter_factor_first is the most important parameter for the top 20%. It also has a higher importance for random samples than for bohb samples. In general it should be low (under 4) for bohb samples and high (near 6) for random samples.
filter_factor_last: The effect is low, and the parameter shouldn't be used to subdivide the data set.
filter_select_per_tournament shouldn't be too high in the general case but does not really matter for good results.
filter_algorithm and random_interleave_random have barely any effect and can be left out for deeper investigations.
This gives us an overview. For the visual analysis it is important to know the configuration spaces and the classes of the parameters.
head(smashy_lcbench)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## 1 0.11449875 0.26100298 knn7 FALSE
## 2 -0.42921649 0.33760502 knn7 TRUE
## 3 0.04823162 0.01486055 knn7 TRUE
## 4 0.85378442 0.73223279 bohblrn TRUE
## 5 -1.45588046 0.85519272 knn7 TRUE
## 6 -0.45467437 0.11901165 knn1 FALSE
## filter_factor_first random_interleave_fraction random_interleave_random
## 1 0.2337803 0.2254148 TRUE
## 2 3.7563675 0.1042924 TRUE
## 3 1.0023879 0.5424223 FALSE
## 4 0.4368656 0.4891884 FALSE
## 5 0.6717368 0.5157025 FALSE
## 6 0.7571962 0.7391276 FALSE
## sample filter_factor_last filter_algorithm filter_select_per_tournament
## 1 bohb 0.3870927 progressive 2.2749194
## 2 random 1.5890745 progressive 2.2996638
## 3 random 2.9274948 progressive 1.9313954
## 4 bohb 5.7753986 progressive 1.5170413
## 5 bohb 6.4220781 tournament 0.5100007
## 6 random 2.9316765 progressive 0.4911047
## yval
## 1 -0.4989768
## 2 -0.5345810
## 3 -0.5401640
## 4 -0.4748074
## 5 -0.5058519
## 6 -0.5397687
str(smashy_lcbench)
## 'data.frame': 10712 obs. of 12 variables:
## $ budget_log_step : num 0.1145 -0.4292 0.0482 0.8538 -1.4559 ...
## $ survival_fraction : num 0.261 0.3376 0.0149 0.7322 0.8552 ...
## $ surrogate_learner : Factor w/ 4 levels "bohblrn","knn1",..: 3 3 3 1 3 2 1 2 3 3 ...
## $ filter_with_max_budget : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 2 2 1 1 1 2 1 ...
## $ filter_factor_first : num 0.234 3.756 1.002 0.437 0.672 ...
## $ random_interleave_fraction : num 0.225 0.104 0.542 0.489 0.516 ...
## $ random_interleave_random : Factor w/ 2 levels "FALSE","TRUE": 2 2 1 1 1 1 1 1 2 1 ...
## $ sample : Factor w/ 2 levels "bohb","random": 1 2 2 1 1 2 1 2 2 2 ...
## $ filter_factor_last : num 0.387 1.589 2.927 5.775 6.422 ...
## $ filter_algorithm : Factor w/ 2 levels "progressive",..: 1 1 1 1 2 1 1 1 2 2 ...
## $ filter_select_per_tournament: num 2.27 2.3 1.93 1.52 0.51 ...
## $ yval : num -0.499 -0.535 -0.54 -0.475 -0.506 ...
We want to look at the importance for the whole dataset (general case) and for the best configurations (top 20%).
plotImportance(task = task)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(task = taskBest)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
For the general case, sample is the most important hyperparameter, while random_interleave_random has little importance. For the best configurations, filter_factor_first and filter_factor_last are the most important parameters, and sample no longer has any importance. The ranking of the parameters has changed a lot, but the values of the importance measure have not changed much, except for sample.
After we have subdivided the data, we first look for structural changes.
summary(smashy_lcbench)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## Min. :-1.7528 Min. :0.0000686 bohblrn:1372 FALSE:4801
## 1st Qu.:-1.0795 1st Qu.:0.1877029 knn1 :3111 TRUE :5911
## Median :-0.4192 Median :0.3602689 knn7 :4803
## Mean :-0.3839 Mean :0.4179906 ranger :1426
## 3rd Qu.: 0.3110 3rd Qu.:0.6339252
## Max. : 1.0196 Max. :0.9998031
## filter_factor_first random_interleave_fraction random_interleave_random
## Min. :0.000763 Min. :0.0000227 FALSE:5008
## 1st Qu.:2.791122 1st Qu.:0.1496729 TRUE :5704
## Median :4.452371 Median :0.3419693
## Mean :4.139002 Mean :0.3893602
## 3rd Qu.:5.690380 3rd Qu.:0.6082803
## Max. :6.907525 Max. :0.9999744
## sample filter_factor_last filter_algorithm
## bohb :8763 Min. :0.000763 progressive:3882
## random:1949 1st Qu.:2.462215 tournament :6830
## Median :4.267029
## Mean :3.960315
## 3rd Qu.:5.569787
## Max. :6.907578
## filter_select_per_tournament yval
## Min. :0.001612 Min. :-0.9647
## 1st Qu.:1.000000 1st Qu.:-0.5923
## Median :1.000000 Median :-0.5377
## Mean :1.086512 Mean :-0.5646
## 3rd Qu.:1.228722 3rd Qu.:-0.5189
## Max. :2.397413 Max. :-0.4690
summary(lcbenchBest)
## budget_log_step survival_fraction surrogate_learner filter_with_max_budget
## Min. :-1.7503 Min. :0.000095 bohblrn: 130 FALSE: 731
## 1st Qu.:-1.0406 1st Qu.:0.170492 knn1 : 796 TRUE :1412
## Median :-0.3780 Median :0.332510 knn7 :1161
## Mean :-0.3321 Mean :0.381662 ranger : 56
## 3rd Qu.: 0.3890 3rd Qu.:0.523938
## Max. : 1.0195 Max. :0.999789
## filter_factor_first random_interleave_fraction random_interleave_random
## Min. :0.004248 Min. :0.0000964 FALSE:1020
## 1st Qu.:3.643269 1st Qu.:0.1208691 TRUE :1123
## Median :4.845318 Median :0.2392768
## Mean :4.546724 Mean :0.3170039
## 3rd Qu.:5.870564 3rd Qu.:0.4727989
## Max. :6.907525 Max. :0.9979292
## sample filter_factor_last filter_algorithm
## bohb :2130 Min. :0.004248 progressive: 798
## random: 13 1st Qu.:3.101750 tournament :1345
## Median :4.634717
## Mean :4.263191
## 3rd Qu.:5.721979
## Max. :6.907525
## filter_select_per_tournament yval
## Min. :0.002426 Min. :-0.5160
## 1st Qu.:1.000000 1st Qu.:-0.5126
## Median :1.000000 Median :-0.5082
## Mean :1.064477 Mean :-0.5047
## 3rd Qu.:1.101817 3rd Qu.:-0.4995
## Max. :2.396205 Max. :-0.4690
surrogate_learner: Many bohblrn and ranger configurations were kicked out in disproportionate numbers. This could mean that these learners perform worse on average. filter_with_max_budget: Proportionally more FALSE values were filtered out, which could mean that TRUE values perform better on average. We can see that only 13 rows of the best 20% configurations use random sampling; the other (over 2100) instances used bohb sampling. That is also the reason why the parameter sample has no importance for the subdivided data frame, since there are barely any configurations with "random" samples left.
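The disproportionate drop-outs described above can be quantified directly by comparing level proportions between the full data and the top 20%. A short sketch, using the smashy_lcbench and lcbenchBest objects defined earlier:

```r
# share of each surrogate_learner level in the full data vs. the best 20%;
# bohblrn and ranger should shrink noticeably in the best subset
round(prop.table(table(smashy_lcbench$surrogate_learner)), 3)
round(prop.table(table(lcbenchBest$surrogate_learner)), 3)
# the same comparison for sample shows how few "random" rows survive
table(lcbenchBest$sample)
```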
The hyperparameters will be examined more precisely in the following sections.
As we found out, "sample" is the most important parameter in the full dataset. This parameter should have the right value for good performance. So, let's look at the effect of the variable in a partial dependence plot. We also check whether the effect applies to all parameters. We can use a heatmap to get a quick overview of interactions; values close to 1 have barely an effect on the outcome.
plotPartialDependence(task, features = c("sample"), rug = FALSE, plotICE = FALSE)
subplot(
plotHeatmap(task, features = c("sample", "budget_log_step"), rug = FALSE),
plotHeatmap(task, features = c("sample", "survival_fraction"), rug = FALSE),
plotHeatmap(task, features = c("sample", "surrogate_learner"), rug = FALSE),
plotHeatmap(task, features = c("sample", "filter_with_max_budget"), rug = FALSE),
plotHeatmap(task, features = c("sample", "filter_factor_first"), rug = FALSE),
plotHeatmap(task, features = c("sample", "random_interleave_fraction"), rug = FALSE),
plotHeatmap(task, features = c("sample", "random_interleave_random"), rug = FALSE),
plotHeatmap(task, features = c("sample", "filter_factor_last"), rug = FALSE),
plotHeatmap(task, features = c("sample", "filter_algorithm"), rug = FALSE),
plotHeatmap(task, features = c("sample", "filter_select_per_tournament"), rug = FALSE),
nrows = 5, shareX = TRUE)
PDP: It can be seen that the target values for bohb samples always lead to better results on average than for random samples.
Heatmaps: Note that survival_fraction and random_interleave_fraction may give better results if a lower value is chosen. Also, the surrogate_learners knn1 and knn7 seem to give better results. On average, the bohb sample is better, but let's look at the best results and the combination of their instances.
We want to look at only the best configurations and verify that mostly “bohb” samples occur. Therefore we split the data set into “bohb” and “random” samples.
random <- smashy_lcbench[smashy_lcbench$sample == "random",]
bohb <- smashy_lcbench[smashy_lcbench$sample == "bohb",]
randomSubset <- TaskRegr$new(id = "task_random", backend = random, target = "yval")
bohbSubset <- TaskRegr$new(id = "task_bohb", backend = bohb, target = "yval")
We split the entire data set because we assume differences between "random" and "bohb" samples: many "random" configurations were filtered out, and the sample parameter lost a lot of importance. For these reasons, we split the data set and focus primarily on the bohb sample in what follows. For the best 20% configurations we focus on bohb only.
Let’s check if there are differences in importance for the parameters in the random subset and the Bohb subset.
plotImportance(task = bohbSubset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(task = randomSubset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
The hyperparameter survival_fraction is the most important parameter, and random_interleave_fraction also has high importance for both subsets. The parameters filter_algorithm and random_interleave_random do not seem to be important at all.
Bohb sample: The parameter budget_log_step is now more important; in the first plot, this parameter was not ranked that high, so we can assume that it is very important for this subset. The importance of the other parameters has not changed that much compared to the full data, but the hyperparameters surrogate_learner and filter_with_max_budget are more important than for random samples.
Random sample: It looks like the right parameter configuration matters more in the random sample, because the parameter importance values are in general higher than in the bohb sample. The parameters filter_factor_last and filter_factor_first have a higher importance in the random sample.
We saw at the beginning that most of the good results were obtained with bohb samples. That's why we will focus on bohb samples only from now on. That is, we remove the 13 rows with "random" samples from the underlying data.
bohbBest <- bohb[bohb$yval >= quantile(bohb$yval, 0.8),]
bohbBestTask <- TaskRegr$new(id = "bohbBestTask", backend = bohbBest, target = "yval")
The survival_fraction parameter is the most important parameter for both samples of the entire data set. With a PDP, we can gain better insight into how the parameter should be configured.
plotPartialDependence(bohbSubset, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(randomSubset, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
In general, lower values achieve better performance than higher values. For the "bohb" subset, the best range seems to be between 0.15 and 0.6, which means that too low a value is not so good in this case. For the "random" subset the curve is almost monotonically decreasing, which means that lower values are always better.
One possibility to find reasons for this structure is to filter the data again. For this we can split the data according to the best 20% of yval values of the bohb samples.
plotPartialDependence(bohbBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE, gridsize = 20)
In this case, higher values seem to be somewhat better. This is surprising, since in the general case low values were better. It could mean that with good configurations of the other parameters, the survival_fraction parameter gives even better results when a high value is chosen. This could also explain the increase in the range between 0.5 and 0.75. Looking at the rug, we see that most configurations lie below 0.5 and the fewest lie above 0.75. Because of the few configurations with high values, the effect of good performances in this range is less strong. In the range between 0.5 and 0.75, there are more configurations, which therefore have a greater impact on the average curve. However, the difference on the y-axis is only small, and therefore it cannot be said that high values are better.
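The density argument made from the rug can also be checked by counting configurations per survival_fraction range. A small sketch, using the overall top-20% subset lcbenchBest defined earlier as a proxy for the best configurations:

```r
# number of best configurations per survival_fraction interval;
# sparse intervals carry less weight in the averaged PDP curve
table(cut(lcbenchBest$survival_fraction, breaks = c(0, 0.25, 0.5, 0.75, 1)))
```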
Another important parameter for the bohb subset is surrogate_learner.
plotPartialDependence(bohbSubset, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
In this plot, knn1 and knn7 seem to be the best choices based on the results so far. For a more detailed analysis, we should divide the data by the individual surrogate_learners and check whether there are differences in the importance of the remaining parameters.
knn1Surrogate <- bohb[bohb$surrogate_learner == "knn1",]
knn7Surrogate <- bohb[bohb$surrogate_learner == "knn7",]
bohblrnSurrogate <- bohb[bohb$surrogate_learner == "bohblrn",]
rangerSurrogate <- bohb[bohb$surrogate_learner == "ranger",]
knn1Subset <- TaskRegr$new(id = "task", backend = knn1Surrogate, target = "yval")
knn7Subset <- TaskRegr$new(id = "task", backend = knn7Surrogate, target = "yval")
bohblrnSubset <- TaskRegr$new(id = "task", backend = bohblrnSurrogate, target = "yval")
rangerSubset <- TaskRegr$new(id = "task", backend = rangerSurrogate, target = "yval")
plotImportance(knn1Subset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(knn7Subset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(bohblrnSubset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
plotImportance(rangerSubset)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
The parameter survival_fraction is very important for the bohblrn and knn1 subsets. This could already be seen in the PDP for survival_fraction. The hyperparameter random_interleave_fraction has high importance for all surrogate_learners. For knn7, budget_log_step seems to be more important than for the other surrogate_learners. To check why the importance differs and whether the parameters have different "good" ranges, let's take a closer look at three very important parameters. We use ICE curves here to gain further insight. Later we check each level separately for the top 20% of configurations to find differences.
plotPartialDependence(knn1Subset, "random_interleave_fraction")
plotPartialDependence(knn7Subset, "random_interleave_fraction")
plotPartialDependence(bohblrnSubset, "random_interleave_fraction")
plotPartialDependence(rangerSubset, "random_interleave_fraction")
For knn1, lower random_interleave_fraction values seem to be better. For knn7 and bohblrn, the random_interleave_fraction values should be neither too high nor too low, and for ranger, higher values lead to better yval results. A good range for bohblrn seems to be between 0.05 and 0.65. For knn1 a value between 0.05 and 0.5 seems good. A good range for knn7 seems to be between 0.25 and 0.75.
plotPartialDependence(knn1Subset, "survival_fraction")
plotPartialDependence(knn7Subset, "survival_fraction")
plotPartialDependence(bohblrnSubset, "survival_fraction")
plotPartialDependence(rangerSubset, "survival_fraction")
Low values for survival_fraction are better in general and could be set to under 0.5, but high values are worst for "bohblrn". For the surrogate_learner "knn7", values around 0.5 seem to produce the best performances; for all other levels, values under 0.5 are better. For the level "knn1" a good choice is between 0.1 and 0.6.
plotPartialDependence(knn1Subset, "budget_log_step", gridsize = 40)
plotPartialDependence(knn7Subset, "budget_log_step", gridsize = 40)
plotPartialDependence(bohblrnSubset, "budget_log_step")
plotPartialDependence(rangerSubset, "budget_log_step")
It is very interesting that the curve for the parameter budget_log_step shows repeated dips; this occurs only for knn7 and knn1. The range is hard to identify since it also depends on the gridsize of the plot. It can be said that a value over -0.5 is a good choice for knn7 and ranger. For bohblrn no suggestion is made because of the repeated dips. For knn1 and knn7, values between -0.5 and 1 seem to achieve good results.
We also want to investigate the best cases and for this directly check the subdivided datasets.
plotPartialDependence(taskBest, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
The surrogate_learner "bohblrn" now performs best, and "ranger" is clearly better than before.
Let's investigate the surprising outcome of the surrogate_learner level bohblrn.
bohblrnBest <- bohbBest[bohbBest$surrogate_learner == "bohblrn",]
bohblrnTaskBest <- TaskRegr$new(id = "task", backend = bohblrnBest, target = "yval")
plotParallelCoordinate(bohblrnTaskBest, labelangle = 10)
plotImportance(bohblrnTaskBest)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
PCP: A high value for filter_factor_last could be better, since there are a lot of lines with high yval values there. The filter_with_max_budget parameter should be set to "TRUE" and the parameter filter_algorithm should be set to "progressive". It looks like a high budget_log_step brings the best results.
Importance plot: In the general case for bohblrn, survival_fraction was (by far!) the most important parameter; now it is budget_log_step and filter_with_max_budget.
Let's investigate why survival_fraction lost importance.
plotPartialDependence(bohblrnTaskBest, "survival_fraction")
plotPartialDependence(bohblrnSubset, "survival_fraction")
Before, a high survival_fraction led to a drop, but one can see that it does not affect the very good results! Here we can see why, in addition to the PDP, ICE curves can be useful as well.
Let us observe the other important parameters from the PCP and importance plot for the "bohblrn" level of surrogate_learner.
plotPartialDependence(bohblrnTaskBest, "budget_log_step", gridsize = 30)
plotPartialDependence(bohblrnTaskBest, "filter_with_max_budget")
plotPartialDependence(bohblrnTaskBest, "filter_factor_last")
plotPartialDependence(bohblrnTaskBest, "filter_algorithm")
In general, budget_log_step performs better with higher values; the worse predictions barely increase with a higher value. There are also little drops around -0.3 to 0.5.
filter_with_max_budget should be set to TRUE. There are more observations for TRUE than for FALSE. Proportionally, more FALSE values have already been thrown out, so this is another indication that TRUE is the better choice for yval.
For the parameter filter_factor_last, high values could produce the best results, but the differences are small.
The thesis that filter_algorithm should be “progressive” cannot be confirmed.
Next, let's investigate the surrogate_learner level knn1.
knn1Best <- bohbBest[bohbBest$surrogate_learner == "knn1",]
knn1BestTaskBest <- TaskRegr$new(id = "task", backend = knn1Best, target = "yval")
plotParallelCoordinate(knn1BestTaskBest, labelangle = 10)
plotImportance(knn1BestTaskBest)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
PCP: The parameter filter_with_max_budget should be set to "TRUE". It looks like there are specific areas for budget_log_step which bring better results. The hyperparameter survival_fraction should be high and the parameter random_interleave_fraction should be low for good results. High filter_factor_last values could be better, since a lot of lines there result in high yval values. The parameter filter_select_per_tournament should be 1.
Importance plot: The parameters filter_factor_first, survival_fraction and filter_factor_last are the most important according to the importance plot.
The interesting parameters according to the PCP and importance plots should be examined.
plotPartialDependence(knn1BestTaskBest, "filter_factor_first" )
plotPartialDependence(knn1BestTaskBest, "survival_fraction")
plotPartialDependence(knn1BestTaskBest, "filter_factor_last")
plotPartialDependence(knn1BestTaskBest, "filter_with_max_budget")
plotPartialDependence(knn1BestTaskBest, "budget_log_step")
plotPartialDependence(knn1BestTaskBest, "filter_select_per_tournament")
plotPartialDependence(knn1BestTaskBest, "random_interleave_fraction")
In general, the parameter filter_factor_first seems to produce better results in low areas; the best results are in the area under 4. The variable survival_fraction should get a value over 0.5 (interesting, because in the general case low values were better!). The hyperparameters filter_factor_last and random_interleave_fraction do not really tell us where the best configurations are.
knn7Best <- bohbBest[bohbBest$surrogate_learner == "knn7",]
knn7BestTaskBest <- TaskRegr$new(id = "task", backend = knn7Best, target = "yval")
plotParallelCoordinate(knn7BestTaskBest, labelangle = 10)
plotImportance(knn7BestTaskBest)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
PCP: filter_algorithm should be "tournament". filter_factor_first should be around 4. random_interleave_random should be FALSE. survival_fraction seems to be low. filter_with_max_budget should be TRUE. random_interleave_fraction should be low and filter_select_per_tournament should be around 1.
Importance plot: The most important parameters are filter_factor_first, filter_factor_last and budget_log_step.
plotPartialDependence(knn7BestTaskBest, "filter_factor_first" )
plotPartialDependence(knn7BestTaskBest, "filter_factor_last")
plotPartialDependence(knn7BestTaskBest, "budget_log_step")
plotPartialDependence(knn7BestTaskBest, "filter_algorithm")
plotPartialDependence(knn7BestTaskBest, "random_interleave_random")
plotPartialDependence(knn7BestTaskBest, "survival_fraction")
plotPartialDependence(knn7BestTaskBest, "random_interleave_fraction")
plotPartialDependence(knn7BestTaskBest, "filter_select_per_tournament")
plotPartialDependence(knn7BestTaskBest, "filter_with_max_budget")
filter_factor_first should be under 4. budget_log_step produces the best values over 0.5 but does not have a big impact in general. Again, we do not see the perfect range for filter_factor_last and random_interleave_fraction, and we cannot confirm for sure that "tournament" is always better. random_interleave_random should be FALSE. filter_select_per_tournament should be over 0.5. filter_with_max_budget should be TRUE.
Finally, ranger should be investigated, since its average performance for good configurations increased a lot.
rangerBest <- bohbBest[bohbBest$surrogate_learner == "ranger",]
rangerBestTaskBest <- TaskRegr$new(id = "task", backend = rangerBest, target = "yval")
plotParallelCoordinate(rangerBestTaskBest, labelangle = 10)
plotImportance(rangerBestTaskBest)
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
PCP: budget_log_step should be high. filter_with_max_budget should be TRUE.
Importance Plot: The most important parameters are filter_factor_first, filter_with_max_budget and budget_log_step.
plotPartialDependence(rangerBestTaskBest, "filter_factor_first")
plotPartialDependence(rangerBestTaskBest, "budget_log_step")
plotPartialDependence(rangerBestTaskBest, "filter_with_max_budget")
A high budget_log_step and a low filter_factor_first seem to produce the best performance. For budget_log_step a value over -0.5 seems to be good; for filter_factor_first a value under 2.5 performs best. It has to be noted that only around 45 observations are left, so the interpretation is not that clear. The parameter filter_with_max_budget should be set to TRUE.
Another important parameter for the bohb samples is budget_log_step. Let's have a look at the PDP.
plotPartialDependence(bohbSubset,"budget_log_step", gridsize = 20)
plotPartialDependence(bohbBestTask, features = c("budget_log_step"), rug = FALSE, gridsize = 20)
In general, the value for budget_log_step should be over -0.5. A high value seems to be a good choice in the subdivided data set.
random_interleave_fraction can vary between 0 and 1. This parameter had a high importance in the bohb sample and in the random sample, slightly more in the random sample. Let's check this parameter.
plotPartialDependence(bohbSubset, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 15)
plotPartialDependence(randomSubset, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 15)
For random_interleave_fraction and the "bohb" sample, a good choice is a value which is neither too high nor too low, since the extremes give the worst performances; a good value seems to be between 0.1 and 0.7. For the "random" sample, low values bring better performances.
plotPartialDependence(bohbBestTask, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 20)
For the best configurations, there is no bad area at the edges.
The parameter filter_factor_last was less important, but a quick check is worthwhile as well.
plotPartialDependence(bohbSubset, "filter_factor_last")
plotPartialDependence(bohbBestTask, features = c("filter_factor_last"), rug = FALSE, gridsize = 10)
The effect is low, and the value should only be chosen according to the surrogate_learner.
plotPartialDependence(bohbSubset, features = c("filter_with_max_budget"), rug = FALSE, gridsize = 10)
plotPartialDependence(taskBest, features = c("filter_with_max_budget"), rug = FALSE, gridsize = 10)
The parameter filter_with_max_budget has a weak effect but should be set to “TRUE”.
The parameter filter_select_per_tournament had barely any effect in the general case but became a little more important in the top 20% configurations. We check the partial dependence and the dependencies with the most important parameters to get more insight.
plotPartialDependence(taskBest, features = c("filter_select_per_tournament"), rug = FALSE, gridsize = 10)
plotPartialDependence(taskBest, features = c("filter_select_per_tournament", "survival_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(taskBest, features = c("filter_select_per_tournament", "filter_factor_first"), rug = FALSE, gridsize = 10)
plotPartialDependence(taskBest, features = c("filter_select_per_tournament", "filter_factor_last"), rug = FALSE, gridsize = 10)
The effect is weak and maybe comes from the peaks around 1. The parameter should probably be set to 1 or slightly above, but it should not affect the result much.
filter_factor_first was a very highly ranked parameter in the parameter importance for the top configurations.
plotPartialDependence(taskBest, features = c("filter_factor_first"), rug = FALSE, gridsize = 20)
plotPartialDependence(taskBest, features = c("filter_factor_first", "filter_factor_last"), rug = FALSE, gridsize = 10)
plotPartialDependence(taskBest, features = c("filter_factor_first", "survival_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(taskBest, features = c("filter_factor_first", "budget_log_step"), rug = FALSE, gridsize = 10)
In general, lower values for filter_factor_first achieve better performance. In combination with filter_factor_last, both parameters should be minimal. A good choice seems to be a value under 4.
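Finally, the narrowed ranges collected throughout the analysis can be combined into one candidate configuration space, as announced at the beginning. The following is a sketch assuming the smashy_lcbench data frame from the loading chunk; the chosen bounds follow the recommendations in the text and are illustrative rather than tuned.

```r
# candidate configuration space derived from the analysis above
reco <- subset(
  smashy_lcbench,
  sample == "bohb" &
    surrogate_learner %in% c("knn1", "knn7") &
    filter_with_max_budget == "TRUE" &
    filter_factor_first < 4 &
    budget_log_step > -0.5 & budget_log_step < 1 &
    survival_fraction > 0.15 & survival_fraction < 0.6
)
# the mean yval of the recommended subset should clearly beat the overall mean
c(recommended = mean(reco$yval), overall = mean(smashy_lcbench$yval), n = nrow(reco))
```

If the recommended mean is markedly closer to -0.4690 than the overall mean, the acquired knowledge indeed translates into good yval values, which was the goal stated at the start.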